ESTIMATING THE INFORMATION PARAMETER OF A TWO-WAY TABLE WITH AP... ETC(U)
JAN 78   W H DUMOUCHEL, N ODEN   N00014-75-C-0555

UNCLASSIFIED   TR-10




ESTIMATING THE INFORMATION PARAMETER OF A TWO-WAY TABLE
WITH APPLICATIONS TO ANIMAL COMMUNICATION


BY

WILLIAM H. DUMOUCHEL
MASSACHUSETTS INSTITUTE OF TECHNOLOGY


NEAL ODEN
UNIVERSITY OF MICHIGAN


TECHNICAL REPORT NO. 10
JANUARY 30, 1978


PREPARED UNDER CONTRACT
N00014-75-C-0555 (NR-042-331)
FOR THE OFFICE OF NAVAL RESEARCH


DEPARTMENT OF MATHEMATICS
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
CAMBRIDGE, MASSACHUSETTS


DISTRIBUTION STATEMENT A
Approved for public release; Distribution Unlimited



Reproduction  in  Whole  or  in  Part  is  Permitted 
for  any  purpose  of  the  United  States  Government 




1.  Introduction 


Recently a number of studies of animal behavior (Hazlett and Bossert, 1965, 1966; Dingle, 1969, 1972; Altmann, 1965) have been concerned with estimating the dependence between the successive acts of two communicating animals. If one is given p = (p_jk), where p_jk is the probability that a pair of successive acts consists of act type j followed by act type k (j = 1, ..., J; k = 1, ..., K), the first act having J possible types and the second act K possible types (often J = K), a measure of dependence between successive acts is

$$
I(p) = \sum_{j=1}^{J} \sum_{k=1}^{K} p_{jk} \log_2 \frac{p_{jk}}{p_{j\cdot}\, p_{\cdot k}},
$$

where p_j. indicates summation of p_jk with respect to k, and similarly for p_.k.

If communication is equivalent to dependence, this quantity may be (and has been) construed to measure the amount of "information" transmitted from one animal to another. Other workers have used I or related measures to assess the information transmitted by the waggle dance of honey bees (Haldane and Spurway, 1954) and the odor trail of fire ants (Wilson, 1962), and to measure dependence between successive phrases in the call of the wood pewee (Chatfield and Lemon, 1970). A good review of the pitfalls involved in the measurement of information transfer is given by Cronbach (1955).


In applying this information measure, workers in animal behavior usually replace p_jk by

$$
\hat p_{jk} = n_{jk} / n,
$$

where n_jk is the number of times act type k was observed to follow act type j, and n is the total number of observed pairs of adjacent acts.
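The plug-in estimate I(p̂) can be computed directly from the table of counts. The following is a minimal Python sketch (standard library only); the function name `information_bits` and the 2 × 2 table are illustrative assumptions, not code or data from this report. Empty cells follow the usual convention 0 log 0 = 0.

```python
import math

def information_bits(counts):
    """Plug-in estimate I(p-hat), in bits, from a J x K table of counts.

    counts[j][k] plays the role of n_jk, the number of times act type k was
    observed to follow act type j, so that p-hat_jk = n_jk / n.
    """
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]          # n_j.
    col_tot = [sum(col) for col in zip(*counts)]    # n_.k
    total = 0.0
    for j, row in enumerate(counts):
        for k, n_jk in enumerate(row):
            if n_jk == 0:
                continue                            # 0 * log 0 is taken as 0
            # p-hat_jk / (p-hat_j. * p-hat_.k) simplifies to n_jk * n / (n_j. * n_.k)
            total += (n_jk / n) * math.log2(n_jk * n / (row_tot[j] * col_tot[k]))
    return total

# Hypothetical 2 x 2 table of counts, for illustration only.
example = [[30, 10],
           [10, 30]]
print(f"I(p-hat) = {information_bits(example):.4f} bits")
```

A table whose counts are exactly proportional to the product of its margins, such as [[10, 20], [20, 40]], gives I(p̂) = 0.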


The usefulness of I(p̂) has been impaired by lack of knowledge of its distribution. Although a few workers (Chatfield and Lemon, 1970; Oden, 1977) have noted that, under the null hypothesis of no dependence, I(p̂) is closely related to G², the likelihood ratio test of independence in a two-way table (and is therefore roughly related to Pearson's X²), by the equation

$$
G^2 = 2 \sum_{j}\sum_{k} n_{jk} \log_e \frac{n_{jk}\, n}{n_{j\cdot}\, n_{\cdot k}} = 2n(\log_e 2)\, I(\hat p) \approx 1.39\, n\, I(\hat p),
$$

tests of alternative hypotheses have not been performed. Since nonzero values of I are of interest, for example, in comparing inter-act dependence in various species, or in the same species under various treatments, a method of estimating I should be of use to behavioral scientists.
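The identity G² = 2n(log_e 2)I(p̂) can be checked numerically. In this standard-library sketch (hypothetical function names and counts), G² and I(p̂) are computed independently and then compared; the constant 2 log_e 2 ≈ 1.386 is the factor rounded to 1.39 above.

```python
import math

def margins(counts):
    """Total n, row totals n_j. and column totals n_.k of a J x K count table."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return sum(rows), rows, cols

def g_squared(counts):
    """Likelihood-ratio statistic G^2 = 2 * sum_jk n_jk * ln(n_jk * n / (n_j. * n_.k))."""
    n, rows, cols = margins(counts)
    return 2.0 * sum(n_jk * math.log(n_jk * n / (rows[j] * cols[k]))
                     for j, r in enumerate(counts)
                     for k, n_jk in enumerate(r) if n_jk > 0)

def information_bits(counts):
    """Plug-in I(p-hat) in bits, computed separately from G^2 for comparison."""
    n, rows, cols = margins(counts)
    return sum((n_jk / n) * math.log2(n_jk * n / (rows[j] * cols[k]))
               for j, r in enumerate(counts)
               for k, n_jk in enumerate(r) if n_jk > 0)

table = [[30, 10], [10, 30]]              # hypothetical counts, n = 80
n = sum(map(sum, table))
g2 = g_squared(table)
via_info = 2.0 * math.log(2.0) * n * information_bits(table)
print(f"G^2 = {g2:.4f};  2 n ln(2) I(p-hat) = {via_info:.4f}")
print(f"2 ln 2 = {2 * math.log(2):.4f}")  # the constant rounded to 1.39 in the text
```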

2. The Distribution of I(p̂) when Acts are Dependent

For convenience, we will discuss the distribution of G² rather than I(p̂), and define the analogous parameter

$$
\lambda = \lambda(p) = 2n(\log_e 2)\, I(p) \approx 1.39\, n\, I(p).
$$


When n is large and λ = I = 0 (i.e., acts are independent), it is well known (Wilks, 1938) that G² has approximately a chi-squared distribution with (J - 1)(K - 1) degrees of freedom. The large-sample distribution of G² (and thus of I(p̂)) is more complicated if λ (and thus I(p)) is positive. In general, the distribution of G², even for n large, depends in a complicated way upon the p_jk, which are usually unknown. See Bishop, Fienberg, and Holland (1975, pp. 518-519) for a discussion and Broffitt and Randles (1977) for a related discussion. However, if I(p) is near 0, but not equal to 0, then it was proved by Wald (1943, Theorem IX), and under simpler assumptions by Feder (1968) and by Davidson and Lever (1970), that when n is large the distribution of G² is approximately that of a noncentral chi-squared variable with (J - 1)(K - 1) degrees of freedom and noncentrality parameter equal to λ. (The references cited prove this for likelihood ratio tests in general. In the appendix it is shown that their definition of the noncentrality parameter reduces to our definition of λ in the present case. See Johnson and Kotz (1970) for a discussion of the noncentral chi-squared distribution, which has mean df + λ and variance 2df + 4λ, where df is the degrees of freedom.) The meaning of the phrase "I(p) near 0 and n large" is that one envisions a sequence of alternative hypotheses p_jk → p_j. p_.k and sample sizes n → ∞ such that the product nI(p), or equivalently λ, approaches a fixed number. In any application, of course, there is only one value of n and of I(p), and the question then arises of how closely the noncentral chi-squared distribution approximates that of G².
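The mean df + λ and variance 2df + 4λ quoted above can be checked by simulation, using the standard construction of a noncentral chi-squared variable as a sum of df squared unit-variance normal variables whose means have squares summing to λ. This is an illustrative Monte Carlo sketch with arbitrarily chosen df = 4 and λ = 6; it is not part of the report's own analysis.

```python
import math
import random

def noncentral_chi2_draw(df, lam, rng):
    """One draw of sum_{i=1}^{df} Z_i^2 with Z_i ~ N(mu_i, 1) and sum mu_i^2 = lam.

    All of the noncentrality is placed on the first coordinate; only
    lam = sum mu_i^2 affects the distribution.
    """
    mus = [math.sqrt(lam)] + [0.0] * (df - 1)
    return math.fsum(rng.gauss(mu, 1.0) ** 2 for mu in mus)

rng = random.Random(1978)          # fixed seed so the run is reproducible
df, lam, reps = 4, 6.0, 100_000
draws = [noncentral_chi2_draw(df, lam, rng) for _ in range(reps)]
mean_hat = math.fsum(draws) / reps
var_hat = math.fsum((x - mean_hat) ** 2 for x in draws) / (reps - 1)
print(f"mean     {mean_hat:.3f}  vs  df + lambda     = {df + lam}")
print(f"variance {var_hat:.3f}  vs  2 df + 4 lambda = {2 * df + 4 * lam}")
```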




The adequacy of this approximation depends on two factors:

1) the size of the np_jk, the expected cell frequencies, and
2) |p_jk - p_j. p_.k|, the cell deviations from independence.

The former quantities should be large; the latter, small. How large, and how small, depends on the degree of approximation desired and, to a lesser extent, on the other characteristics of the p_jk's. Many rules of thumb have been suggested for how large the expected cell sizes should be for chi-squared tests of contingency tables to be valid. Most authorities would probably agree that if every cell in the observed table has at least 2 observations, the approximations are practically adequate. Also, if fewer than half the cells have at least two observations, it is certainly wrong to use this theory. Between these rather broad limits, the results should be interpreted with caution. When many of the cell sizes are too small, the low-frequency types of acts could be excluded, or combined if they group themselves into natural categories. As for the size of |p_jk - p_j. p_.k|, the simulation study presented in Table I suggests that if the cell sizes are not too small and if every |p_jk - p_j. p_.k| < .1, the noncentral chi-squared approximation to the distribution of G² is fairly good for the larger percentage points, though not so good for the lower tail.
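The rules of thumb above can be turned into a quick screening function. This is a hypothetical helper, not a procedure given in the report: it uses the observed counts for the cell-size rule and, since p is unknown, substitutes p̂ when checking |p_jk - p_j. p_.k| < .1.

```python
def rule_of_thumb(counts):
    """Screen an observed J x K count table against the rules of thumb.

    Cell deviations from independence use p-hat in place of the unknown p.
    """
    n = sum(sum(r) for r in counts)
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    cells = [x for r in counts for x in r]
    frac_ge_2 = sum(1 for x in cells if x >= 2) / len(cells)
    max_dev = max(abs(counts[j][k] / n - (rows[j] / n) * (cols[k] / n))
                  for j in range(len(rows)) for k in range(len(cols)))
    if frac_ge_2 == 1.0:
        verdict = "practically adequate"
    elif frac_ge_2 < 0.5:
        verdict = "do not use the noncentral approximation"
    else:
        verdict = "interpret with caution"
    return {"fraction_cells_ge_2": frac_ge_2,
            "max_deviation": max_dev,
            "deviations_below_0.1": max_dev < 0.1,
            "verdict": verdict}

print(rule_of_thumb([[30, 10], [10, 30]]))
```

In the example every cell is large, but the largest estimated deviation is .125, so the second condition fails.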

[Table I. Simulated distribution of G² for tables T1 through T5. Columns: true table (p_jk); I(p); largest |p_jk - p_j. p_.k|; simulated sample size n; smallest expected cell size; noncentrality λ = 2n(log_e 2)I(p); and the number of the 1000 simulated G² values at or below the 2.5, 10, 50, 90 and 97.5 percentiles of the noncentral chi-squared distribution. Numerical entries not reproduced.]
* Differs significantly (2 standard errors) from that expected by the noncentral chi-squared distribution.

Table I presents the results of a computer simulation of the distribution of G² for 5 tables, T1 through T5. Four of the tables, T1 through T4, have J = K = 2, and their p_jk are given in the column labeled "True Table" of Table I. Notice that T1 and T2 have marginal frequencies of 50-50 for the two types of act, while T3 and T4 have a more unbalanced 90-10 marginal distribution. Table T5 is a larger, 4 × 4, table formed by dividing the cell probabilities of T1 through T4 by 4 and arranging them as indicated. The next columns of Table I contain the information measure I(p), the largest value of |p_jk - p_j. p_.k|, the simulated sample size n, the smallest expected cell size, and the noncentrality parameter λ = 2n(log_e 2)I(p) for each simulated experiment. For each combination of p and n, 1000 random samples from a simple multinomial (neither margin fixed) were taken, and the value of G² was computed. The samples were generated from uniformly distributed pseudo-random numbers computed using a multiplicative congruential generator (see Ericson and Fox, 1976, pp. 66-69).

The last five columns of Table I give the empirical cumulative distribution of G², namely the number of times (out of 1000) that the observed G² was less than or equal to the 2.5, 10, 50, 90, and 97.5 percentiles of the noncentral chi-squared distribution with the corresponding df and λ. The table entries marked with an asterisk are those values of the observed cdf of G² which differed significantly (2 standard errors) from that of the noncentral chi-squared distribution. As was mentioned above, the upper tails of the observed distributions of G² agree well with the approximation whenever there are 2 or more expected observations in each cell and every |p_jk - p_j. p_.k| < .1. This suggests that approximate lower confidence limits for λ, and thus I(p), can profitably be found using percentiles of the noncentral chi-squared distribution. Since Table I shows that the lower tails of G² are not so well approximated for these sample sizes, it seems that rather larger expected cell frequencies and/or smaller values of |p_jk - p_j. p_.k| are needed in order to trust upper confidence limits for λ. Fortunately, in most applications one is more interested in establishing lower limits for I(p).
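A simulation of the kind summarized in Table I can be sketched as follows. Because the true tables of Table I are not reproduced here, the 2 × 2 table below is hypothetical, and Python's built-in Mersenne Twister generator stands in for the multiplicative congruential generator used in the report. The sketch only compares the simulated mean of G² with the noncentral chi-squared mean df + λ; it does not reproduce the percentile counts.

```python
import math
import random
from collections import Counter

def g_squared(counts):
    """G^2 = 2 * sum n_jk ln(n_jk n / (n_j. n_.k)); empty cells contribute 0."""
    n = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return 2.0 * sum(x * math.log(x * n / (rows[j] * cols[k]))
                     for j, r in enumerate(counts)
                     for k, x in enumerate(r) if x > 0)

def simulate_g2(p, n, reps, seed):
    """G^2 for `reps` samples of size n from a simple multinomial over the
    cells of p (neither margin fixed)."""
    cells = [(j, k) for j in range(len(p)) for k in range(len(p[0]))]
    weights = [p[j][k] for j, k in cells]
    rng = random.Random(seed)   # Mersenne Twister, not the report's generator
    values = []
    for _ in range(reps):
        tally = Counter(rng.choices(cells, weights=weights, k=n))
        counts = [[tally[(j, k)] for k in range(len(p[0]))] for j in range(len(p))]
        values.append(g_squared(counts))
    return values

# Hypothetical 2 x 2 table: 50-50 margins, every |p_jk - p_j. p_.k| = .05.
p = [[0.30, 0.20], [0.20, 0.30]]
n, reps, df = 200, 1000, 1
rows = [sum(r) for r in p]
cols = [sum(c) for c in zip(*p)]
info = sum(p[j][k] * math.log2(p[j][k] / (rows[j] * cols[k]))
           for j in range(2) for k in range(2))
lam = 2.0 * math.log(2.0) * n * info        # noncentrality, about 1.39 n I(p)
sims = simulate_g2(p, n, reps, seed=1978)
print(f"lambda = {lam:.2f}, df + lambda = {df + lam:.2f}, "
      f"simulated mean of G^2 = {sum(sims) / reps:.2f}")
```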


3. Obtaining Approximate Confidence Limits for I(p)

If the noncentral chi-squared approximation to the distribution of G² is accurate, and if we let χ²_α(df, λ) denote the αth percentile of the distribution with df = (J - 1)(K - 1) degrees of freedom and noncentrality λ, then, if the value G² is observed, the solution λ_α to

$$
\chi^2_{\alpha}(df, \lambda_{\alpha}) = G^2
$$

is an approximate one-sided confidence limit for the true λ: a lower limit with confidence coefficient α, or equivalently an upper limit with confidence coefficient 1 - α. This is converted to a bound on the information I(p) by the relation

$$
I(p) = \lambda / 1.39n.
$$
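Solving χ²_α(df, λ_α) = G² directly requires the noncentral chi-squared distribution function. The sketch below uses the standard series expansion of that function as a Poisson(λ/2) mixture of central chi-squared distribution functions, computes the central terms from the regularized incomplete gamma function, and finds λ_α by bisection, since the distribution function decreases as λ increases. All function names and the example values G² = 30, df = 9, n = 20 are hypothetical; this is an illustration, not the program used to draw Figures A and B.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x): power series when x < a + 1,
    otherwise 1 - Q(a, x) with Q from a continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        term = total = 1.0 / a
        ap = a
        while term > total * 1e-16:
            ap += 1.0
            term *= x / ap
            total += term
        return math.exp(log_prefix) * total
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_prefix) * h

def ncx2_cdf(x, df, lam):
    """P(chi'^2 <= x) as a Poisson(lam/2) mixture of central chi-squared CDFs."""
    if x <= 0.0:
        return 0.0
    if lam <= 0.0:
        return reg_lower_gamma(df / 2.0, x / 2.0)
    half = lam / 2.0
    last = int(half + 10.0 * math.sqrt(half) + 50)   # remaining Poisson mass is negligible
    return sum(math.exp(-half + i * math.log(half) - math.lgamma(i + 1))
               * reg_lower_gamma(df / 2.0 + i, x / 2.0)
               for i in range(last + 1))

def lambda_limit(g2, df, alpha, rel_tol=1e-9):
    """Solve ncx2_cdf(g2, df, lam) = alpha for lam >= 0 by bisection.

    Returns 0 when even lam = 0 puts g2 at or below the alpha point."""
    if ncx2_cdf(g2, df, 0.0) <= alpha:
        return 0.0
    lo, hi = 0.0, max(1.0, g2)
    while ncx2_cdf(g2, df, hi) > alpha:              # expand until bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(g2, df, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical observation: G^2 = 30 from a 4 x 4 table (df = 9), n = 20 pairs.
g2, df, n = 30.0, 9, 20
upper = lambda_limit(g2, df, 0.025)     # alpha = 2.5%: upper limit
lower = lambda_limit(g2, df, 0.975)     # alpha = 97.5%: lower limit
scale = 2.0 * math.log(2.0) * n         # the "1.39 n" of the text
print(f"{lower:.2f} < lambda < {upper:.2f};  "
      f"{lower / scale:.3f} < I(p) < {upper / scale:.3f}")
```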


If either df or λ is quite large, then Johnson (1959) shows that

$$
\frac{\chi'^2 - df - \lambda + 1}{\left(2\chi'^2 + 2\lambda\right)^{1/2}}
$$

is approximately a standard normal deviate, where χ'² denotes the noncentral chi-squared variable. This leads easily to the following approximate confidence limits λ_α:

$$
\lambda_{.5} = G^2 - df + 1, \qquad
\lambda_{\alpha} = \lambda_{.5} + z_{\alpha}^2 - z_{\alpha}\left[2\left(G^2 + \lambda_{.5}\right) + z_{\alpha}^2\right]^{1/2},
$$

where z_α is the αth percentile of the standard normal distribution. For df and λ only moderately large, it is necessary to solve the equation χ²_α(df, λ_α) = G² directly, and Figures A and B are designed to do this easily, for α = 2.5% and 97.5% respectively. One merely finds the value of G² on the abscissa, goes up vertically to the curve labeled with the appropriate degrees of freedom, interpolating with respect to df if necessary, and then reads the value of λ_α on the ordinate scale. Figure A is used for an upper limit and Figure B for a lower limit. The two limits together form an approximate 95% confidence interval for λ, and are converted to an interval for I(p) upon division by 1.39n.

[Figure A: 95% confidence limits for the noncentrality λ from an observed G² (α = 2.5%, upper limits); abscissa G², ordinate λ. Plot not reproduced.]

[Figure B: 95% confidence intervals for the noncentrality λ, if G² is an observation from a noncentral chi-squared distribution: lower endpoints. Plot titled "Lower 95% confidence limits for lambda", with curves labeled df = 1, 4, 6, 9, 12, 16. Plot not reproduced.]
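For large df or λ, the closed-form limits from Johnson's normal approximation given earlier in this section are easy to code. This is a minimal sketch, assuming the observed G² is substituted for χ'²; the function name and the example values are hypothetical. Limits are truncated at 0, since λ cannot be negative.

```python
import math
from statistics import NormalDist

def johnson_limit(g2, df, alpha):
    """Approximate lambda_alpha from Johnson's normal approximation
    (chi'^2 - df - lam + 1) / (2 chi'^2 + 2 lam)^(1/2) ~ N(0, 1), with chi'^2 = G^2."""
    z = NormalDist().inv_cdf(alpha)            # alpha-th percentile of N(0, 1)
    lam_med = g2 - df + 1.0                    # lambda_.5
    radicand = 2.0 * (g2 + lam_med) + z * z
    if radicand <= 0.0:                        # G^2 far below df: limit is 0
        return 0.0
    return max(0.0, lam_med + z * z - z * math.sqrt(radicand))

# Hypothetical large-df case: G^2 = 100 with df = 20 and n = 500 pairs.
g2, df, n = 100.0, 20, 500
lo = johnson_limit(g2, df, 0.975)   # lower limit
hi = johnson_limit(g2, df, 0.025)   # upper limit
scale = 2.0 * math.log(2.0) * n
print(f"{lo:.2f} < lambda < {hi:.2f};  {lo / scale:.4f} < I(p) < {hi / scale:.4f}")
```

Because each limit solves the normal approximation exactly, substituting λ_α back into (G² - df - λ_α + 1)/(2G² + 2λ_α)^{1/2} returns z_α.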

The noncentral chi-squared percentiles displayed in the figures were computed using Pearson's (1959) approximation for most values of λ and df, and a series expansion for the distribution function (see Johnson and Kotz, 1970, vol. 2, p. 132) in the few cases where Pearson's approximation was not accurate enough for the figures.
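Pearson's (1959) approximation, as we understand it, replaces the noncentral chi-squared variable by c χ²_f + b, with c, f and b chosen so that the first three cumulants agree. The sketch below implements that three-moment fit. The central chi-squared percentiles for non-integer f come from the Wilson-Hilferty cube-root approximation, which is our own substitution, since the report does not say how they were computed; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_params(df, lam):
    """c, f, b such that c * chi^2_f + b matches the first three cumulants
    of a noncentral chi-squared with df degrees of freedom and noncentrality lam."""
    c = (df + 3.0 * lam) / (df + 2.0 * lam)
    f = (df + 2.0 * lam) ** 3 / (df + 3.0 * lam) ** 2
    b = -lam ** 2 / (df + 3.0 * lam)
    return c, f, b

def chi2_quantile_wh(f, q):
    """Wilson-Hilferty approximation to the q-th quantile of a central chi^2_f."""
    h = 2.0 / (9.0 * f)
    return f * (1.0 - h + NormalDist().inv_cdf(q) * math.sqrt(h)) ** 3

def ncx2_quantile_pearson(q, df, lam):
    """Approximate q-th quantile of the noncentral chi-squared distribution."""
    c, f, b = pearson_params(df, lam)
    return c * chi2_quantile_wh(f, q) + b

for lam in (0.0, 5.0, 20.0):
    print(lam, [round(ncx2_quantile_pearson(q, 9, lam), 2) for q in (0.025, 0.5, 0.975)])
```

With λ = 0 the fit reduces to the central distribution (c = 1, f = df, b = 0).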

If a point estimate of I(p) is desired, one may take Î = λ̂/1.39n, which is approximately unbiased for large n and has approximate standard error [2df + 4λ]^{1/2}/1.39n. If n is large, the difference between Î and I(p̂) will not be important. However, if n is not large compared to df, the fact that Î "corrects" for df can be important, at least in terms of the mean square error (MSE) of the estimator. For example, in the simulations for the 4 × 4 Table T5 (with df = 9, n = 20, and I(p) = .458), the 1000 simulated G²'s had a sample mean of 20.5 and a sample standard deviation of 6.4. Simple calculations then show that √MSE was .36 for I(p̂) but only .23 for Î. The √MSE would be even smaller for the estimate defined to be max(0, Î), which is of course the estimate one would report in practice.

Often it is desired to compare values of I(p) for different populations. A simple intuitive test is to notice whether confidence intervals for the separate population parameters overlap. Although this criterion will be more conservative than a proper likelihood ratio test for I(p1) = I(p2), it is much simpler. If one desires to estimate the common value of I(p) for two or more populations, one can use the present methods after adding the corresponding values of G², df, and n from the various samples. This is practically equivalent to averaging the various values of Î, weighted by their sample sizes. However, this may not be as efficient as the much more complicated maximum likelihood procedure (see Johnson and Kotz, 1970, p. 136).
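The point estimate, its standard error, the √MSE comparison for Table T5, and the pooling rule can be sketched together. The definition of λ̂ is not legible in this copy; the sketch assumes λ̂ = G² - df, the moment estimate implied by the mean df + λ, which reproduces the quoted value of .23 for Î. Function names and the pooled example are hypothetical.

```python
import math

def point_estimate(g2, df, n):
    """I-hat = lambda-hat / (2 ln 2 * n), ASSUMING lambda-hat = G^2 - df.

    Returns (max(0, I-hat), I-hat, approximate standard error); the standard
    error [2 df + 4 lambda]^(1/2) / (1.39 n) is evaluated at max(0, lambda-hat).
    """
    scale = 2.0 * math.log(2.0) * n
    lam_hat = g2 - df
    i_hat = lam_hat / scale
    se = math.sqrt(2.0 * df + 4.0 * max(0.0, lam_hat)) / scale
    return max(0.0, i_hat), i_hat, se

def pooled_estimate(samples):
    """Common I(p) for several populations: add G^2, df and n, then estimate."""
    return point_estimate(sum(s[0] for s in samples),
                          sum(s[1] for s in samples),
                          sum(s[2] for s in samples))

def root_mse(mean_g2, sd_g2, df, n, true_info, subtract_df):
    """sqrt(bias^2 + variance) of G^2/(1.39 n), or of (G^2 - df)/(1.39 n),
    from the mean and standard deviation of simulated G^2 values."""
    scale = 1.39 * n                      # the text's rounded constant
    bias = (mean_g2 - (df if subtract_df else 0)) / scale - true_info
    return math.sqrt(bias ** 2 + (sd_g2 / scale) ** 2)

# Figures quoted in the text for Table T5: df = 9, n = 20, I(p) = .458,
# simulated G^2 mean 20.5 and standard deviation 6.4.
print(round(root_mse(20.5, 6.4, 9, 20, 0.458, subtract_df=False), 2))  # 0.36, plug-in I(p-hat)
print(round(root_mse(20.5, 6.4, 9, 20, 0.458, subtract_df=True), 2))   # 0.23, I-hat
```

Under this choice of λ̂, pooling G², df and n gives exactly the n-weighted average of the separate Î values.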

4. Applications to the Literature

Some examples of the application of the distribution theory discussed above to experimental data are presented in Table II and will now be discussed. It should be noted that the use of information statistics or a chi-squared analysis is not strictly appropriate for most of these data, since there are often too many cells with low expected values. However, published tables with sufficiently high expected values that depict information transfer in animals are rare, and so we have decided to perform the analyses on these data. Our results must certainly be more accurate than the heretofore uniform practice of ignoring the variability of G² and merely assuming I(p) = I(p̂). (Actually, only the Altmann (1965) data, discussed next, flagrantly violate the rules of thumb established in section 2.)


[Table II. Application of the method for confidence limits to examples in the literature. Columns: line; I(p̂); Î; approximate 95% confidence limits; species; source; remarks; n; df. Numerical entries not reproduced.]


Altmann, in his 1965 study of social communication in rhesus monkeys on Cayo Santiago Island, reports on the amount of information held in common between adjacent acts (line 1). This number, I(p̂) = 2.01 bits*, is larger than that reported for any other species except man. Altmann notes that his study is beset by a number of methodological difficulties, including the fact that he did not distinguish between intra- and inter-individual sequences and the fact that he did not distinguish between various kinds of behavior, such as maternal, courtship, aggressive, etc. From the point of view of assessing the significance of Altmann's findings, another difficulty is the fact that he recognized 120 possible types of acts. The result is a two-way table with 14 400 cells but only 4 571 observations! A formal application of our procedure yields Î = .08 with a 95% confidence interval for I(p) of from .03 to .13. The degrees of freedom used in the computation are df = 12 210, which reflects the fact that some rows and columns of the observed table are entirely empty (and thus omitted) because some rare acts occurred only at the beginnings or ends of sequences.
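The degrees of freedom after dropping empty rows and columns are (J' - 1)(K' - 1), where J' and K' count the rows and columns containing at least one observation. A minimal sketch with a hypothetical 4 × 4 table and a hypothetical function name:

```python
def effective_df(counts):
    """(J' - 1)(K' - 1), where J' and K' count only the rows and columns that
    contain at least one observation (entirely empty rows/columns are omitted)."""
    j_used = sum(1 for row in counts if any(row))
    k_used = sum(1 for col in zip(*counts) if any(col))
    return (j_used - 1) * (k_used - 1)

# Hypothetical table: act 3 never begins a pair and act 4 never ends one.
counts = [[5, 2, 0, 0],
          [1, 7, 3, 0],
          [0, 0, 0, 0],
          [2, 0, 4, 0]]
print(effective_df(counts))   # 3 nonempty rows, 3 nonempty columns: (3-1)(3-1) = 4
```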

However, before these data could be used to support or reject Altmann's contention that rhesus monkeys do communicate, one must either develop a theory for the distribution of I(p̂) when most cells are empty, or combine similar types of acts to reduce the degrees of freedom and increase the cell sizes.

* Actually Altmann (1965) reports 1.96 bits per act, but a recomputation of his figure (using data kindly supplied to us by Professor Altmann) with a larger computer yields the figure 2.01. The difference between the two values is negligible for our purposes.

Using as a guide the groups of acts suggested by Altmann (1965, Table 2), and using the data kindly provided by Professor Altmann, we lumped the 120 acts into 18 groups. Groups 1-13 each contained a single act, namely each of the acts which occurred 98 times or more, originally numbered 2-5, 9, 19, 25, 41, 43, 45, 50, 51, and 99. The remaining acts were placed into one of the 5 groups 14-18, with rough meanings as follows:

group 14: fear, submission, and compound acts containing fearful and submissive elements
group 15: attack, threat, and their compounds
group 16: friendly acts and compounds
group 17: vocalizations
group 18: miscellaneous (other acts hard to group logically)

This grouping has no doubt lumped some dissimilar acts, especially in groups 17 and 18, but our purpose here is not to propose a particular set of group definitions for these data, merely to demonstrate the effectiveness of grouping in general when estimating I(p).

The resulting 18 × 18 table has 324 cells, of which 95 have n_jk ≥ 10, 100 have 2 ≤ n_jk ≤ 9, and 129 have n_jk = 0 or 1. Every value of |p̂_jk - p̂_j. p̂_.k| is less than .04.
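The grouping step can be sketched as collapsing the table with a map from original act types to groups, applied to both the first and the second act, followed by the kind of cell-size tally given above. The function names, the grouping and the counts below are hypothetical.

```python
def collapse(counts, group_of):
    """Lump act types into groups. group_of[a] is the 0-based group of act a;
    the same grouping is applied to rows (first act) and columns (second act)."""
    g = max(group_of) + 1
    out = [[0] * g for _ in range(g)]
    for j, row in enumerate(counts):
        for k, n_jk in enumerate(row):
            out[group_of[j]][group_of[k]] += n_jk
    return out

def tally_cells(counts):
    """Number of cells with n_jk >= 10, with 2 <= n_jk <= 9, and with n_jk in {0, 1}."""
    cells = [x for row in counts for x in row]
    return {">=10": sum(x >= 10 for x in cells),
            "2-9": sum(2 <= x <= 9 for x in cells),
            "0-1": sum(x <= 1 for x in cells)}

# Hypothetical 4-act table: acts 0 and 1 are common; rare acts 2 and 3 share a group.
counts = [[12, 6, 1, 0],
          [7, 15, 0, 1],
          [1, 0, 0, 1],
          [0, 2, 1, 0]]
grouped = collapse(counts, [0, 1, 2, 2])
print(grouped)               # [[12, 6, 1], [7, 15, 1], [1, 2, 2]]
print(tally_cells(grouped))  # {'>=10': 2, '2-9': 4, '0-1': 3}
```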

The rules of thumb given in section 2 suggest using our approximate theory with a cautious interpretation. The results are given in line 2 of Table II: I(p̂) = 1.18, Î = 1.14, and 1.08 < I(p) < 1.19 with 95% confidence. Thus we find evidence to support the value 1.1 bits per act for communication among rhesus monkeys, but not the value 2.0 reported by Altmann (1965).




Hazlett and Bossert (1965) present data on aggressive communication in a number of hermit crabs, some of which are presented in lines 3-6. They assert that, while Clibanarius tricolor and Calcinus tibicen have roughly the same value for the information parameter, that for Pagurus marshi is higher. Hazlett and Bossert then provide plausible biological reasons why information transfer is more essential to Pa. marshi than to Cl. tricolor or Ca. tibicen. Although our analysis does not contradict these explanations, inspection of the confidence intervals shows that there is no strong evidence differentiating these three species, although I(p) for Py. operculatus is quite surely below that of Pa. marshi. Hazlett (personal communication) has been unable to suggest an interpretation of this result.

In Dingle's (1969) experiments on aggressive communication in the mantid shrimp, Gonodactylus bredini, two adults of the same sex were placed in a finger bowl and allowed to interact for one hour. Data obtained from twenty such pairings were lumped together. Each 60 minutes of data were broken into two 10-minute and two 20-minute periods, and a separate transition matrix was formed for each period. To test stationarity of the process, column marginals of temporally adjacent tables were compared using a chi-squared test, revealing significant differences except between the two 20-minute tables. Dingle asserted that information transferred during the second 10-minute period is significantly higher than during any of the other periods. Furthermore, this dependence of the information statistic on time was also found in Gonodactylus spinulosis and in matches between G. bredini and G. spinulosis (not shown). Dingle also found a decline in the frequency of aggressive acts in G. bredini, and a difference in the types of transitions as time went on. He attributes these effects to the establishment of a dominant-subordinate relation in the species mentioned during the second 10 minutes.

Our analysis of the four time periods, which is presented in lines 7-10 of Table II, shows that information transferred during the second period is indeed significantly higher than during either the first or the last period, but not significantly different from that during the third.

Dingle also compiled tables depicting transitions between each individual's act and the next act of the same animal. It should be noted that this type of intra-individual analysis confounds two-step dependence with two steps of one-step dependence (Oden, 1977). However, we have calculated confidence intervals for these data and present them in lines 11 and 12 and, for G. spinulosis, …

In calculating the information statistic for G. spinulosis and for matches between G. spinulosis and G. bredini, Dingle (1972) followed essentially the same protocol as for G. bredini. The results are reported in lines 13-15.

Dingle observed that, generally, values for mantid shrimp exceed those for hermit crabs. He suggests that this might reflect the greater seriousness of physical combat in the mantid shrimp.




Although the two independent values of Î between individual mantid shrimps (lines 7 and 14) are each higher than all 4 Î's from experiments with hermit crabs (lines 3-6), this in itself is not much support for Dingle's interpretation, since a Mann-Whitney U-test on the ranks of the Î or I(p̂) statistics for mantid shrimp and hermit crabs fails to reveal a significant difference between the two classes (p = .13, two-tailed). However, the fact that there is only one overlap in the 95% confidence intervals between the two groups (G. spinulosis, line 14, and P. marshi, line 6), while there is much overlap within the hermit crabs, tends to confirm his assertion.
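The Mann-Whitney figure can be checked by exact enumeration. With 2 shrimp values and 4 crab values there are C(6, 2) = 15 equally likely rank assignments under the null hypothesis, and complete separation in either direction occurs in 2 of them, giving a two-tailed p-value of 2/15 ≈ .13. In the sketch below the numerical values are hypothetical placeholders, since only their ranks matter; ties are assumed absent.

```python
from itertools import combinations
from math import comb

def mann_whitney_exact_p(x, y):
    """Exact two-sided p-value of the Mann-Whitney U test, assuming no ties,
    by enumerating every equally likely assignment of ranks to sample x."""
    m, n = len(x), len(y)
    rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    u_x = sum(rank[v] for v in x) - m * (m + 1) // 2
    extreme = min(u_x, m * n - u_x)
    hits = 0
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2
        if min(u, m * n - u) <= extreme:
            hits += 1
    return hits / comb(m + n, m)

# Hypothetical values: both shrimp values exceed all four crab values.
shrimp = [0.9, 1.1]
crabs = [0.3, 0.4, 0.5, 0.6]
print(round(mann_whitney_exact_p(shrimp, crabs), 3))   # 0.133, i.e. 2/15
```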

5. Acknowledgements

We would like to thank Brian Hazlett and Lien-Ju Chao for comments on an earlier version of this work, Stuart Altmann for the use of his data, Herman Chernoff for comments on the proof in the appendix, and Mary Coffey for some help with the computer programming.


REFERENCES 


Altmann,  S.  A.  (1965).  Sociobiology  of  Rhesus  Monkeys  II: 
Stochastics  of  Social  Communication,  J_^  Th . Biol. 

8:490-522. 

Bishop , Feinberg  and  Holland  ( 1 9 7 5 ).  D i s c r e t e Multivariate  Analysis, 
M.I.T.  Press,  Cambridge,  Mass. 

Broffit,  J.  D.  and  Randles,  R.H.  (1977).  A Power  Approximation 
for  the  Chi-Square  Goodness  of  Fit  Test:  Simple  hypothesis 
Case.  JASA  72:604-607. 

Chatfield,  C.  and  R.  E.  Lemon  (1970)  Analysing  Sequences  of 
Behavioral  Events.  Th . Biol  . 29:4  27-455  . 

Cronbach,  L.  J.  (1955)  On  the  non-rational  application  of 
information  measures  in  psychology.  In  H.  Quastler  (ed) 

Informa  t io  n Theo  r y in  Psychology : 14-26.  Free  Press, 

Glecncoe  , 111  . 

Davidson  R.  R.  and  Leve,  W.  E,  (1970)  The  limiting  distribution 
of  the  likelihood  ratio  statistic  under  a class  of  local 
alternatives.  Sankhya  Ser.  A 32:209-224. 

Dingle,  H.  A.  (1969).  Statistical  and  Information  Analysis  of 
Aggressive  Communication  in  the  mantis  shrimp  Gonodac  tyl us 
bred  ini  Manning.  Anim . Behav . 1 7 : 561-579. 

Dingle,  H.A.  (1972).  Aggressive  Behavior  in  Stomatopods  and 
the  Use  of  Information  Theory  in  the  Analysis  of  Animal 
Communication.  In  H.  E.  Winn  and  B.  L.  011a  (eds),  Behav ior 
of  Marine  Animals:  Current  Perspectives  in  Research,  vol . 1, 
Invertebrates:  126-156.  Plenum  Press,  N.  Y, 

Ericson,  W.  A.,  and  Fox,  Daniel  J.  (1976).  Simulation  with 
MIDAS . Statistical  Research  Laboratory,  University  of 
Michigan,  Ann  Arbor,  Michigan  48109 

Feder,  P.  I.  (1968).  On  the  distribution  of  the  log  likelihood 
ratio  test  statistic  when  the  true  parameter  is  "near"  the 
boundaries  of  the  hypothesis  regions.  Annal s Ma  t hemat Ic  s 
Statistics  39:  2044-2055. 

Haldane,  J.  B.  S.  and  Spurway,  H.  (1954).  A statistical 

analysis  of  communication  in  Apis  melllfera  and  a comparison 
with  communication  in  other  animals.  Insects  Soclaux,  1(3): 
247-283. 


20. 


Hazlett,  B.  A.  and  W.  H.  Bossert.  (1965).  A statistical  analysis 
of  the  aggressive  communication  systems  of  some  hermit  crabs. 
Anim.  Behav. , 13:357-373. 

Hazlett, B. A. and Bossert, W. H. (1966). Additional observations on the communication systems of hermit crabs. Anim. Behav. 14: 546-549.

Johnson, N. L. (1959). On an extension of the connection between Poisson and χ² distributions. Biometrika 46: 352-363.

Johnson, N. L. and Kotz, S. (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin Company, Boston.

Kullback, S. (1959). Information Theory and Statistics. John Wiley and Sons, New York.

Oden, N. L. (1977). Partitioning dependence in nonstationary behavioral sequences. In B. A. Hazlett (ed.), Quantitative Methods in the Study of Animal Behavior: 203-220. Academic Press.

Pearson, E. S. (1959). Note on an approximation to the distribution of non-central χ². Biometrika 46: 364.

Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54: 426-482.

Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9: 60-62.

Wilson, E. O. (1962). Chemical communication among workers of the fire ant Solenopsis saevissima (Fr. Smith): 2. An information analysis of the odor trail. Anim. Behav. 10(1,2): 134-164.




APPENDIX 


The noncentral chi-squared approximation to the distribution of
the likelihood ratio statistic.

This appendix shows that, for n large and I(p) small, the noncentrality parameter λ in Wald's (1943) proof of the asymptotic distribution of the likelihood ratio statistic coincides with $\lambda(p) = 2n(\log_e 2)\, I(p)$, where

$$ I(p) = \sum_j \sum_k p_{jk} \log_2 \frac{p_{jk}}{p_{j.}\, p_{.k}} . $$
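The two quantities can be computed directly from a table of cell probabilities. A minimal Python sketch, assuming the table is supplied as a nested list whose entries sum to 1 and using the convention 0 log 0 = 0; the names `information_bits` and `noncentrality` are illustrative, not taken from the report:

```python
import math

def information_bits(p):
    """I(p) = sum_jk p_jk log2(p_jk / (p_j. p_.k)), in bits; 0 log 0 is taken as 0."""
    row = [sum(r) for r in p]              # row margins p_j.
    col = [sum(c) for c in zip(*p)]        # column margins p_.k
    total = 0.0
    for j, r in enumerate(p):
        for k, pjk in enumerate(r):
            if pjk > 0:
                total += pjk * math.log2(pjk / (row[j] * col[k]))
    return total

def noncentrality(p, n):
    """lambda(p) = 2 n (log_e 2) I(p): the log_e 2 factor converts bits to nats."""
    return 2.0 * n * math.log(2.0) * information_bits(p)

# A perfectly dependent 2 x 2 table carries one bit; an independent table carries none.
print(information_bits([[0.5, 0.0], [0.0, 0.5]]))
print(noncentrality([[0.25, 0.25], [0.25, 0.25]], 100))
```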

Let X be a discrete-valued random vector with probabilities $p(x;\theta)$. The parameter $\theta$ belongs to a subset $\Theta$ of Euclidean space $E^q$ and is unknown. Let $g(\cdot)$ be a function over the parameter space which takes values in $E^r$ ($r \le q$) and has continuous first partial derivatives. To test the hypothesis $g(\theta) = 0$ versus $g(\theta) \neq 0$ using a sample of n independent observations $x_1, \ldots, x_n$ of X, consider the likelihood ratio statistic

$$ G^2 = 2 \log_e \frac{\sup_{\theta \in \Theta} \prod_{i=1}^{n} p(x_i;\theta)}{\sup_{g(\theta)=0} \prod_{i=1}^{n} p(x_i;\theta)} . $$
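For the independence hypothesis in a two-way table, the numerator supremum is attained at $\hat p_{jk} = n_{jk}/n$ and the denominator supremum at $\hat p_{j.}\hat p_{.k}$, so $G^2 = 2\sum_j\sum_k n_{jk}\log_e[n\,n_{jk}/(n_{j.}n_{.k})] = 2n(\log_e 2)\,I(\hat p)$. The following sketch checks that identity on a hypothetical table of counts; the function names are illustrative.

```python
import math

def g_squared(counts):
    """Likelihood ratio statistic G^2 for independence in a two-way table of counts."""
    n = sum(sum(r) for r in counts)
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    g2 = 0.0
    for j, r in enumerate(counts):
        for k, njk in enumerate(r):
            if njk > 0:  # empty cells contribute nothing (0 log 0 = 0)
                g2 += 2.0 * njk * math.log(n * njk / (row[j] * col[k]))
    return g2

def plugin_information_bits(counts):
    """I(p-hat) in bits, with p-hat_jk = n_jk / n."""
    n = sum(sum(r) for r in counts)
    p = [[x / n for x in r] for r in counts]
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(pjk * math.log2(pjk / (row[j] * col[k]))
               for j, r in enumerate(p) for k, pjk in enumerate(r) if pjk > 0)

counts = [[30, 10, 5], [8, 25, 12]]  # hypothetical 2 x 3 table of counts
n = sum(map(sum, counts))
# The two numbers agree up to floating-point rounding.
print(g_squared(counts), 2 * n * math.log(2) * plugin_information_bits(counts))
```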

Suppose a sequence of alternatives $\theta_n$ converges at rate $n^{-1/2}$ to a point $\theta_0$, i.e., $n^{1/2}(\theta_n - \theta_0) \to c$ for some fixed vector $c$, where $g(\theta_0) = 0$ and the matrix $\partial g(\theta_0)/\partial\theta$ is of full rank. Then Wald's (1943) Theorem IX states that the limiting distribution of $G^2$ is the noncentral chi-squared distribution with r degrees of freedom and noncentrality

$$ \lambda = \lim_{n\to\infty} n\, [g(\theta_n)]'\, \Sigma^{-1}(\theta_0)\, g(\theta_n), \tag{1} $$

where $\Sigma(\theta)$ is the asymptotic covariance matrix of the quantity $n^{1/2}[g(\hat\theta) - g(\theta)]$ as $n \to \infty$ with $\theta$ fixed, $\hat\theta$ denoting the maximum likelihood estimator of $\theta$ based on sample size n.
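A small Monte Carlo sketch can illustrate the theorem in the two-way case. It assumes a 2 × 2 local alternative with uniform margins, perturbed by $d = c\,n^{-1/2}$, and compares the average simulated $G^2$ with $r + \lambda$, the mean of a noncentral chi-squared variable with $r = (J-1)(K-1) = 1$ degree of freedom, taking $\lambda = 2n(\log_e 2)I(p)$ as in this appendix. The sample size, perturbation, seed, and replication count are arbitrary choices, and the output is an illustration rather than a proof.

```python
import math
import random

def g_squared_cells(cells, J, K):
    """G^2 for independence, from a flat row-major list of J*K cell counts."""
    n = sum(cells)
    row = [sum(cells[j * K:(j + 1) * K]) for j in range(J)]
    col = [sum(cells[k::K]) for k in range(K)]
    return 2.0 * sum(c * math.log(n * c / (row[i // K] * col[i % K]))
                     for i, c in enumerate(cells) if c > 0)

def mean_g2(p_flat, J, K, n, reps, seed=1):
    """Average of G^2 over `reps` multinomial samples of size n drawn from p_flat."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        draws = rng.choices(range(J * K), weights=p_flat, k=n)
        cells = [draws.count(i) for i in range(J * K)]
        total += g_squared_cells(cells, J, K)
    return total / reps

n, d = 400, 0.025                                # assumed: d = c / sqrt(n) with c = 0.5
p = [0.25 + d, 0.25 - d, 0.25 - d, 0.25 + d]     # row and column margins stay at 1/2
lam = 2 * n * sum(x * math.log(x / 0.25) for x in p)   # 2n (log_e 2) I(p), I(p) log_e 2 in nats
sim = mean_g2(p, 2, 2, n, reps=4000)
print(f"lambda = {lam:.3f}, r + lambda = {1 + lam:.3f}, simulated mean G^2 = {sim:.3f}")
```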

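Now the quantity

$$ \delta(\theta_n) = [g(\theta_n)]'\, \Sigma^{-1}(\theta_0)\, g(\theta_n) \tag{2} $$

may be interpreted as the squared distance, in $E^q$, between the point $\theta_n$ and the nearest point on the surface $\{\theta : g(\theta) = 0\}$, where the metric used to define distance is that given by the Fisher information matrix $H(\theta_0)$, defined by

$$ H(\theta) = E\left[ \left( \frac{\partial \log_e p(x;\theta)}{\partial \theta} \right) \left( \frac{\partial \log_e p(x;\theta)}{\partial \theta} \right)' \right] . $$

$H^{-1}(\theta)$ is the asymptotic covariance matrix of $n^{1/2}(\hat\theta - \theta)$, and $\Sigma(\theta) = g_\theta\, H^{-1}(\theta)\, g_\theta'$, where $g_\theta = \{\partial g_l / \partial \theta_m\}$, $l = 1, \ldots, r$ and $m = 1, \ldots, q$. It is well known (see, e.g., Kullback (1959, pp. 26-28)) that Fisher information is a limiting form of the Kullback-Leibler information measure: namely, for two points $\theta_0$, $\theta_1$ close together,

$$ (\theta_1 - \theta_0)'\, H(\theta_0)\, (\theta_1 - \theta_0) = 2 \sum_x p(x;\theta_1) \log_e \frac{p(x;\theta_1)}{p(x;\theta_0)} + O(|\theta_1 - \theta_0|^3) . $$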
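For a multinomial family parametrized by its free cell probabilities, the Fisher quadratic form $(\theta_1-\theta_0)'H(\theta_0)(\theta_1-\theta_0)$ for one observation reduces to $\sum_x [p(x;\theta_1)-p(x;\theta_0)]^2/p(x;\theta_0)$. The sketch below, with an arbitrarily chosen three-cell distribution and perturbation direction, compares it with $2\sum_x p(x;\theta_1)\log_e[p(x;\theta_1)/p(x;\theta_0)]$ along $p_1 = p_0 + t\,v$: the ratio tends to 1 and the gap divided by $t^3$ stays bounded, as the cubic remainder suggests.

```python
import math

def two_kl(p1, p0):
    """2 * sum_x p1(x) log_e(p1(x) / p0(x)); log1p keeps small differences accurate."""
    return 2.0 * sum(a * math.log1p((a - b) / b) for a, b in zip(p1, p0))

def fisher_quadratic(p1, p0):
    """Multinomial Fisher form for one observation: sum_x (p1(x) - p0(x))^2 / p0(x)."""
    return sum((a - b) ** 2 / b for a, b in zip(p1, p0))

p0 = [0.2, 0.3, 0.5]          # assumed base distribution
v = [0.1, -0.05, -0.05]       # assumed direction; sums to 0, so p0 + t*v stays a distribution
for t in (0.1, 0.01, 0.001):
    p1 = [b + t * dv for b, dv in zip(p0, v)]
    q, k = fisher_quadratic(p1, p0), two_kl(p1, p0)
    print(f"t={t}: ratio={k / q:.6f}, gap/t^3={(k - q) / t ** 3:.6f}")
```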

Thus, if $g(\theta_0) = 0$, the minimum distance $\delta(\theta_n)$ is

$$ \delta(\theta_n) = \min_{\{\theta : g(\theta) = 0\}} 2 \sum_x p(x;\theta_n) \log_e \frac{p(x;\theta_n)}{p(x;\theta)} + O(n^{-3/2}) . \tag{3} $$

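In our case, where $x = (j,k)$, $p(x;\theta) = p_{jk}$, and $\{\theta : g(\theta) = 0\} \equiv \{\{p_{jk}\} : p_{jk} - p_{j.}\, p_{.k} = 0 \text{ for all } j,k\}$, it can be checked that the sum on the right of (3) is minimized by $p_{jk} = p_{j.}^{(n)} p_{.k}^{(n)}$, so that

$$ \delta(\theta_n) = \delta(\{p_{jk}^{(n)}\}) = 2 \sum_j \sum_k p_{jk}^{(n)} \log_e \frac{p_{jk}^{(n)}}{p_{j.}^{(n)}\, p_{.k}^{(n)}} + O(n^{-3/2}) = 2(\log_e 2)\, I(p^{(n)}) + O(n^{-3/2}) , \tag{4} $$

assuming that $p_{jk}^{(n)} \to p_{jk}^{(0)} = p_{j.}^{(0)} p_{.k}^{(0)} \neq 0$ for all $j,k$, and that $n^{1/2}(p_{jk}^{(n)} - p_{jk}^{(0)})$ converges to a finite limit for all $j,k$. Thus, combining (1)-(4), we have

$$ \lambda = \lim_{n\to\infty} 2n(\log_e 2)\, I(p^{(n)}) , $$

as desired.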
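Both steps of this final argument can be checked numerically. The first part of the sketch below verifies, for a hypothetical 2 × 3 table, the exact identity $\sum_{jk} p_{jk}\log_e[p_{jk}/(a_j b_k)] = (\log_e 2)I(p) + \sum_j p_{j.}\log_e(p_{j.}/a_j) + \sum_k p_{.k}\log_e(p_{.k}/b_k)$ for any independent table $a_j b_k$; since the last two sums are nonnegative, the product of the margins minimizes the sum in (3). The second part evaluates $2n(\log_e 2)I(p^{(n)})$ along an assumed local alternative $p^{(n)} = p^{(0)} + c\,s\,n^{-1/2}$ with uniform 2 × 2 $p^{(0)}$ and $s = [[1,-1],[-1,1]]$; because s has zero margins, the values approach the quadratic limit $\sum_{jk} c^2 s_{jk}^2/p_{jk}^{(0)} = 16c^2$. All names are illustrative.

```python
import math

def kl(p, q):
    """sum_x p(x) log_e(p(x) / q(x)) over flat lists, skipping cells with p(x) = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def margins(p):
    """Row sums p_j. and column sums p_.k of a nested-list table."""
    return [sum(r) for r in p], [sum(col) for col in zip(*p)]

def flat(p):
    return [x for r in p for x in r]

def info_nats(p):
    """(log_e 2) I(p): the information in nats, i.e. KL(p || product of margins)."""
    rows, cols = margins(p)
    return kl(flat(p), [ri * ck for ri in rows for ck in cols])

# (i) For any independent table a_j b_k, KL(p || a x b) splits into
#     (log_e 2) I(p) + KL(rows || a) + KL(cols || b), so the margins minimize it.
p = [[0.30, 0.10, 0.05], [0.10, 0.25, 0.20]]   # hypothetical 2 x 3 table
rows, cols = margins(p)
a, b = [0.4, 0.6], [0.3, 0.3, 0.4]             # arbitrary independent competitor
lhs = kl(flat(p), [ai * bk for ai in a for bk in b])
rhs = info_nats(p) + kl(rows, a) + kl(cols, b)
print(lhs, rhs, info_nats(p))

# (ii) Local alternative with zero-margin perturbation s; 2n (log_e 2) I(p^(n)) -> 16 c^2.
c = 1.0
for n in (10**2, 10**4, 10**6):
    d = c / math.sqrt(n)
    pn = [[0.25 + d, 0.25 - d], [0.25 - d, 0.25 + d]]
    print(n, 2 * n * info_nats(pn))
```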



REPORT DOCUMENTATION PAGE

Title: Estimating the Information Parameter of a Two-Way Table with Applications to Animal Communication
Type of report: Technical Report
Authors: William H. DuMouchel; Neal Oden (University of Michigan)
Performing organization: Department of Mathematics, M.I.T., Cambridge, Mass. 02139
Contract: N00014-75-C-0555 (NR-042-331)
Controlling office: Office of Naval Research, Statistics & Probability Program, Code 436
Report date: January 30, 1978
Number of pages: 23
Security classification (of this report): Unclassified
Distribution statement (of this report): Approved for public release; distribution unlimited
Key words: Information, Shannon measure, animal communication, non-central chi-squared distribution, two-way table, confidence limits

